Modelling conditional variance heterogeneity in parameter design
Abstract
This report is an addendum to the Engel and Huele (1996b) Technometrics paper. It contains a theorem that relates ordinary and weighted least squares estimation in the context of experimental design. It also gives some more background information on the simulation results from that paper, and the S-Plus code of the simulation is explained.

Keywords: dispersion effect, iteratively reweighted least squares, simulation, Taguchi methods, variance function estimation.

A. Freek Huele is a consultant at IBIS UvA. He is a member of ASQC. Jan Engel is senior consultant at CQM bv.

1 Introduction

The aim of this report is to give some more details about the Engel and Huele (1996b) approach to modelling the residual variance after fitting a response model on experimental data. In this section, first a short review of dispersion modelling in general is given, and second the contents of this report are described.

In statistical quality control, there is currently much interest in modelling both the mean and variance of the data. It was the Japanese engineer Genichi Taguchi (1986) who argued that every deviation from target has to be considered a quality loss. Using a quadratic loss function, he showed that in order to minimize quality loss, a quality characteristic should be on target with as small a variation as possible. In the context of experimental design, this attention has led to interest in modelling both the mean value and the variance of the response as a function of the factors that were used in the experiment. Factors that appear in the model for the variance are said to have a dispersion effect, whereas when they appear in the model for the mean they have a location effect. Naturally, one factor can have both types of effects, or none. Factors with a dispersion effect are called dispersion factors. Dispersion factors can generally be found in the following ways:

- analyzing a signal-to-noise (S/N) ratio;
- analyzing interactions between design and noise factors;
- analyzing the log of the standard deviation at each design point, in the case of replicated data;
- residual analysis, for replicated and unreplicated data.

In the case of a Taguchi experiment, "quasi" replicated data are obtained by taking replications across the noise array. Therefore, a different analysis must be carried out than in the case of true replicates. Taguchi (1986) proposes to analyse an S/N ratio as the response to find dispersion factors. After the S/N ratio is calculated, ANOVA and a graphical analysis are used to determine the dispersion factors (i.e. those that maximize the S/N ratio). Modelling an S/N ratio, as described by Lucas (1994), will not be discussed here.

Analyzing interactions between design and noise factors by means of generalized interaction plots was suggested by Shoemaker and Tsui (1993). This analysis can also be used for classical experiments that include noise factors, cf. Welch et al. (1990). The proposed analysis of Welch et al. (see also Shoemaker et al. 1991) is based on analyzing quality loss using simulations from the fitted model, but this is not considered here.

In the case of replicated data, say y_{i1}, ..., y_{ir} for run i = 1, ..., n, the logarithm of the standard deviation s_i within each run can be used as a response. For example, the following main-effects-only model

    log(s_i²) = γ_0 + ∑_{j=1}^k x_{ij} γ_j + ε_i

can be fitted using the same mechanism as when fitting a model for the mean value (in which case the sample mean of each design point is used as a response). Here, x_{ij} represents the level of design factor j in run i, and γ_j is the dispersion effect corresponding to x_{ij}. The logarithm is used to stabilize the variance of the response and to make the distribution of the response more symmetric. Generally, r should be 4 or more to obtain reasonable estimates of the true variance at each design point. A practical reference for dispersion modelling using replicated data is the book by Schmidt, S.R. and Launsby, R.G. (1980, section 5.4).
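As an illustration of this mechanism, the following sketch simulates a hypothetical replicated 2^3 experiment and fits the main-effects-only dispersion model by ordinary least squares on log(s_i²). The factor levels, effect sizes and random seed are invented for the example; this is not the report's own S-Plus code.

```python
import numpy as np

# Hypothetical replicated 2^3 experiment: n = 8 runs, r = 4 replicates.
# X holds an intercept column plus the three +/-1 factor-level columns.
rng = np.random.default_rng(0)
levels = np.array([[i, j, k] for i in (-1, 1) for j in (-1, 1) for k in (-1, 1)])
X = np.column_stack([np.ones(8), levels])

# Assumed true dispersion effects, used only to generate example data.
gamma_true = np.array([0.5, 0.8, 0.0, -0.4])
sigma = np.exp(X @ gamma_true / 2.0)           # per-run standard deviations
y = rng.normal(loc=10.0, scale=sigma[:, None], size=(8, 4))

# Response for the dispersion model: log of the within-run sample variance.
log_s2 = np.log(y.var(axis=1, ddof=1))

# Fit log(s_i^2) = gamma_0 + sum_j x_ij gamma_j by ordinary least squares.
gamma_hat, *_ = np.linalg.lstsq(X, log_s2, rcond=None)
```

With only r = 4 replicates per run, the individual γ̂_j are quite noisy, which is why the text asks for r of at least 4.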
Notice that when the variance of the response is non-constant, the model for the mean should be updated by weighted least squares. Significance of the γ̂_j can be judged by any of the following means: Pareto analysis, the Wald test, a normal plot of the γ̂_j, or a likelihood ratio test.

Another approach for replicated data, due to Aitkin (1987), is as follows:

1. Fit a saturated model for the mean.
2. Fit a saturated model for the dispersion.
3. Use backward elimination to find a good model for the variance. That is, look at the change in deviance of the full model when the smallest effect is eliminated.
4. Simplify the model for the mean, again using backward elimination and weighted least squares.

Other references in this area are Nair and Pregibon (1988) and Grego (1993).

In the case of unreplicated data, a saturated model has residuals that are all trivially equal to zero. An alternative is then to fit a saturated model for the mean and use a normal plot or the pseudo standard error (Haaland and O'Connell, 1995) to find the largest effects. After a reduced model is determined, the residuals can be used to model the variance function. Significance of the γ̂_j can then again be judged by any of the means that were listed for replicated data. Alternatively, the Aitkin approach can be followed and changes in deviance can be compared. Dispersion factors can also be found by calculating the F ratios of Box and Meyer (1986).

Notice that the situation of unreplicated data also applies to response surface analysis of Taguchi experiments. In that case, first a conditional model is built, assuming that both design and noise factors are fixed. Since this step is identical to the well-known response surface approach for analyzing classical experiments, residual analysis is also useful for analyzing Taguchi experiments. Modelling data and optimizing the design factor settings based on such experiments has been discussed in Myers, Khuri and Vining (1992), Engel (1992) and Engel and Huele (1996a,b).
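The remark about updating the mean model by weighted least squares can be sketched as one cycle of: mean fit by OLS, variance-function fit on the log squared residuals, and a reweighted refit of the mean. This is a minimal illustration under invented data, not the authors' implementation; the design, the coefficients and the small constant added before taking logarithms are all assumptions of the example.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares via the normal equations X'WX b = X'Wy."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

def one_cycle(X, Z, y):
    """One mean/variance cycle: OLS fit, variance model, WLS refit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS fit of the mean
    r2 = (y - X @ beta) ** 2
    # Log-linear variance model on squared residuals (small ridge avoids log 0).
    gamma = np.linalg.lstsq(Z, np.log(r2 + 1e-12), rcond=None)[0]
    sigma2_hat = np.exp(Z @ gamma)
    beta_wls = wls(X, y, 1.0 / sigma2_hat)               # refit the mean by WLS
    return beta_wls, gamma

# Invented example data with a variance that depends on the first factor.
rng = np.random.default_rng(1)
n = 32
x = rng.choice([-1.0, 1.0], size=(n, 2))
X = np.column_stack([np.ones(n), x])
Z = X.copy()
sigma = np.exp(0.5 * (0.2 + 0.8 * x[:, 0]))
y = 4.0 + 1.5 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(scale=sigma)
beta_wls, gamma = one_cycle(X, Z, y)
```

Repeating `one_cycle` with the previous weights gives the iteratively reweighted least squares scheme whose number of cycles is studied in section 3.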
The contents of this report are as follows. Section 2 contains some theorems that relate ordinary least squares (OLS) estimation and weighted least squares (WLS) estimation for 2^k experiments. The results of a simulation study are discussed in section 3. Details about the S-Plus program for this simulation are collected in section 4.

2 On the equivalence of OLS and WLS

In this section, two theorems are presented that relate OLS to WLS with known or estimated weights in the case of an orthogonal design matrix. Much has already been written about the equality of the ordinary and weighted least squares estimators in the case of a general full rank design matrix; see Seber (1977, p. 63). Here, theorem 1 and theorem 2 are presented in the context of experimental design. Theorem 1 is valid for any non-singular square design matrix and follows also from Seber. Theorem 2 is new and is proved for 2^k designs only.

Theorem 1. Let X_s be a non-singular (n x n) design matrix. Denote the data vector by y = (y_1, ..., y_n)'. The saturated model is given by y = X_s β, where β is an (n x 1) parameter vector with ith element β_i. Let Σ = diag(σ_1², ..., σ_n²) be a diagonal matrix with var(y_i) as the ith diagonal element. Furthermore, let β̂(OLS) = (X_s'X_s)⁻¹X_s'y and β̂(WLS) = (X_s'Σ⁻¹X_s)⁻¹X_s'Σ⁻¹y. In that case:

    β̂(OLS) = X_s⁻¹ y = β̂(WLS).

Proof. Since X_s and Σ are both square and non-singular, applying basic rules for inverting products of matrices gives (X_s'X_s)⁻¹X_s' = X_s⁻¹ = (X_s'Σ⁻¹X_s)⁻¹X_s'Σ⁻¹. Multiply these terms from the right by y to prove the result. □

Notice that theorem 1 also applies to β̂(WLS) with estimated weights. In general, not all n parameters will be estimated, but only m ≤ n. Therefore, without loss of generality, define the matrix X containing the first m columns of X_s. For the reduced model, define β̂(OLS) = (X'X)⁻¹X'y and β̂(WLS) = (X'Σ⁻¹X)⁻¹X'Σ⁻¹y.
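Theorem 1 is easy to check numerically. The sketch below builds a saturated 2^2 design and verifies that OLS and WLS coincide, and both equal X⁻¹y, for an arbitrary set of positive run variances (the data and variances are randomly generated for the example):

```python
import numpy as np

# Numerical check of theorem 1: for a saturated (square, non-singular)
# design matrix, OLS and WLS coincide for ANY positive weights.
rng = np.random.default_rng(2)

# Saturated 2^2 design: intercept, two main effects, their interaction.
X = np.array([[1, -1, -1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1],
              [1,  1,  1,  1]], dtype=float)
y = rng.normal(size=4)
sigma2 = rng.uniform(0.1, 10.0, size=4)        # arbitrary run variances

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
W = np.diag(1.0 / sigma2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

assert np.allclose(beta_ols, beta_wls)         # both equal X^{-1} y
assert np.allclose(beta_ols, np.linalg.solve(X, y))
```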
Furthermore, define β̂_i(WLS) as the ith element of β̂(WLS), and define Δ_j β̂_i(WLS) as the change in β̂_i(WLS) due to introducing a new regressor x_{:j} into the model, with x_{:j} = (x_{1j}, ..., x_{nj})' being the jth column of X_s, j ∈ {m+1, ..., n}. Furthermore, let Δ̄_j β̂_i(WLS) be the change in β̂_i(WLS) when all regressors up to index j are included in the model. In order to prove theorem 2, the following lemmas are useful.

Lemma 1. For all i = 1, ..., m: Δ_j β̂_i(WLS) equals the ith element of the vector

    β̂_j(WLS) (X'Σ⁻¹X)⁻¹ X'Σ⁻¹ x_{:j}.

Proof. Follows directly from Seber (1977, pp. 68-69). □

Lemma 2. For any orthogonal design: β̂_i(WLS) = β̂_i(OLS) ⟺ Δ̄_n β̂_i(WLS) = 0.

Proof. Both implications follow from Δ̄_n β̂_i(OLS) = 0 and theorem 1. □

Lemma 2 gives a necessary and sufficient condition for β̂_i(WLS) = β̂_i(OLS). In theorem 2, a sufficient condition for Δ̄_n β̂_i(WLS) = 0 in the case of a 2^k design is proven. Hence, in that case it can be determined, before iteratively reweighted least squares has started, for which values of i the estimate β̂_i(WLS) will be equal to β̂_i(OLS).

Lemma 3. Define <x_{:i}, x_{:j}> = ∑_{t=1}^n x_{ti} x_{tj} / σ_t², let i ~ j denote "the effect corresponding to i is aliased with the effect corresponding to j", and let i ≁ j denote "is not aliased with". Under the assumptions of theorem 2, and using the same complete defining relation:

    i ≁ j ⟹ <x_{:i}, x_{:j}> = 0.

Proof. i ≁ j implies that, for all r = 1, ..., 2^p,

    ∑_{t : σ_t² = c(r)} x_{ti} x_{tj} = 0,

where c(r) is the rth of the 2^p variances as predicted by the variance model from theorem 2. This implies that, for all r = 1, ..., 2^p,

    (1/c(r)) ∑_{t : σ_t² = c(r)} x_{ti} x_{tj} = 0,

so that

    ∑_{r=1}^{2^p} (1/c(r)) ∑_{t : σ_t² = c(r)} x_{ti} x_{tj} = ∑_{r=1}^{2^p} ∑_{t : σ_t² = c(r)} x_{ti} x_{tj} / σ_t² = ∑_{t=1}^n x_{ti} x_{tj} / σ_t² = <x_{:i}, x_{:j}> = 0. □

Remark: the (⟸) proof is more difficult. We would then need to prove that i ~ j ⟹ <x_{:i}, x_{:j}> ≠ 0. However, i ~ j implies that, for all r = 1, ..., 2^p, ∑_{t : σ_t² = c(r)} x_{ti} x_{tj} ≠ 0, so that <x_{:i}, x_{:j}> ≠ 0 only if positive and negative terms do not cancel out. This depends on the values of the σ_i², and thus, when estimated, on the data.
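Lemma 3 can be illustrated numerically. In the sketch below, the variance is assumed to depend on design factor D_1 only, so the complete defining relation is {I, D_1}: each effect is aliased exactly with itself times D_1. The weighted inner product then vanishes for a non-aliased pair of columns and is non-zero for an aliased pair. The variance model is an invented example.

```python
import numpy as np

# 2^3 design with sigma_t^2 a function of D1 = x1 only.
x = np.array([[((i >> b) & 1) * 2 - 1 for b in range(3)] for i in range(8)],
             dtype=float)
x1, x2, x3 = x.T
sigma2 = np.exp(1.0 + 0.8 * x1)          # assumed variance model in D1

def ip(a, b):
    """Weighted inner product <a, b> = sum_t a_t b_t / sigma_t^2."""
    return np.sum(a * b / sigma2)

# x2 is not aliased with x3 (x2*x3 is not in {I, D1}): inner product zero.
assert np.isclose(ip(x2, x3), 0.0)
# x2 IS aliased with x1*x2: the inner product is non-zero.
assert not np.isclose(ip(x2, x1 * x2), 0.0)
```

The second assertion also shows why the converse of lemma 3 is delicate: its value depends on the σ_t², not only on the aliasing pattern.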
Theorem 2. Assume that a model with m effects (intercept included) is selected, based on a 2^k experiment. Denote the response vector by y = (y_1, ..., y_n)'. The model is denoted by y = Xβ + ε, where X is an (n x m) orthogonal design matrix with elements x_{ij} ∈ {-1, 1}, and β is an (m x 1) parameter vector. The residual error ε is assumed to have expectation zero and variance-covariance matrix Σ = diag(σ_1², ..., σ_n²). Let β̂(OLS) = (X'X)⁻¹X'y and β̂(WLS) = (X'Σ⁻¹X)⁻¹X'Σ⁻¹y. Furthermore, suppose σ_i² is a link-linear function of p ≤ m design factors D_1, ..., D_p. Form the complete defining relation consisting of D_1, ..., D_p and their 2^p - p - 1 generalized interactions. Then, for some i ∈ {1, ..., m}, β̂_i(WLS) = β̂_i(OLS) if the effect corresponding to i is not aliased with any effect that is not in the model.

Proof. Since Δ̄_n β̂_i(WLS) = ∑_{j=m+1}^n Δ_j β̂_i(WLS), it equals zero when Δ_j β̂_i(WLS) equals zero for all j ∈ {m+1, ..., n}. Therefore, first start with a preliminary model that contains m effects and add a new regressor with index j = m+1. In order to show when Δ_j β̂_i(WLS) = 0, we apply lemma 1, and notice that Δ_j β̂_i(WLS) can be written as

    β̂_j(WLS) ∑_{l=1}^m cov[β̂_i(WLS), β̂_l(WLS)] <x_{:l}, x_{:j}>.

Note that when β̂_j(WLS) = 0, Δ_j β̂_i(WLS) = 0 trivially. A sufficient condition for this expression to equal zero is that all m terms appearing in the sum over l are zero. For that purpose, assume that i ≁ j.

For l = i, using lemma 3 we obtain that <x_{:i}, x_{:j}> = 0. For l ≠ i, there are two possibilities: either l ~ i or l ≁ i. If l ~ i then, using lemma 3 again,

    l ~ i ⟹ l ≁ j ⟹ <x_{:l}, x_{:j}> = 0.

If l ≁ i, then the following property of the scaled design matrix (X'Σ^{-1/2})' is used. The columns of this matrix can be placed in groups, according to the aforementioned defining relation. This means that columns that are aliased are in one group, and columns that are not aliased are in different groups.
The matrix (X'Σ⁻¹X) is then equal to a partitioned diagonal matrix, with square non-singular matrices centered around the main diagonal and zeros elsewhere. If the ith and lth columns of (X'Σ^{-1/2})' are in different groups, then the (i, l)th element of both (X'Σ⁻¹X) and (X'Σ⁻¹X)⁻¹ equals zero. Hence, we obtain that cov[β̂_i(WLS), β̂_l(WLS)] = 0.

Combining all arguments, it can now be concluded that

    i ≁ j ⟹ Δ_j β̂_i(WLS) = 0,

where j = m+1. Now x_{:m+1} is added to the model, and j is fixed at m+2. Notice, however, that i ≁ m+1 ⟹ cov[β̂_i(WLS), β̂_{m+1}(WLS)] = 0, so that all that is needed is that i is not aliased with m+2. Continuing this process until j = n, we obtain that Δ̄_n β̂_i(WLS) = 0, under the condition that i is not aliased with any j for j ∈ {m+1, ..., n}. The theorem is proved by applying lemma 2. □

In this proof, it is sufficient to know when the entries of the variance-covariance matrix (X'Σ⁻¹X)⁻¹ are zero. The (⟸) proof is more difficult: the (⟸) proof of lemma 3 is needed, and more details about (X'Σ⁻¹X)⁻¹ need to be known.

3 Monte Carlo simulation

In the Engel and Huele Technometrics paper, the results of a simulation study were described. The aim of this study is to determine the effect of the number of weighted least squares cycles on the parameter estimates of a fitted model. In this section, some more details about this simulation are given.

The simulation is based on the following response model:

    y_i = μ_i + ε_i,

with the errors ε_i ~ N(0, σ_i²). A complete 2^5 factorial design matrix is taken. Let x_i denote the (1 x 5) vector of the levels of the 5 design factors in run i, i = 1, ..., 32. The conditional mean is taken linear in the main effects and interactions among the 5 design factors, in vector notation denoted by g(x_i):

    μ_i = β_0 + g'(x_i) β,

where β_0 is a constant and β is a parameter vector. The coefficients in the model for the mean are chosen as follows: β_0 = 4, and the main effects are β_1 = 1.5, β_2 = 0, β_3 = -2, β_4 = 1 and β_5 = -0.75.
The two-factor interactions are taken as β_13 = -2.50, β_14 = -0.25 and β_34 = 0.50; all other effects are taken zero. The dispersion model is given by

    σ_i² = exp{γ_0 + α (x_i' γ)}.

The dispersion constant α is introduced to determine the amount of variance heterogeneity. The intercept γ_0 = 1, and the vector γ is given by γ_1 = -1, γ_2 = -0.5, γ_3 = 0, γ_4 = 1 and γ_5 = 2.

The simulation study is set up according to a complete 3^3 design. The factors and levels that are considered are listed in Table 3.1.

    factor   description                        -1     0     1
    C        number of cycles                    0     2     4
    α        variance heterogeneity            0.0   0.3   0.6
    df       degrees of freedom in residuals     6    16    26

    Table 3.1: Factors and levels in the Monte Carlo experiment.

The bias and mean squared error (MSE) of the estimates β̂ and γ̂ are used as responses. Notice that these measures for γ̂_j depend on the model for the mean only through the number of terms that is included in that model. So the values that are assumed for the β's are not important. Three levels are used for each factor because it is expected that the main effects with respect to the MSE will be quadratic.

Note that in practice the factor α cannot be controlled by the data analyst; the factor df can be controlled to some extent by replicating the data in the experiment. The factor C can easily be controlled. The levels of the factor df depend on the number of terms in the fitted model for the mean: main effects only; main effects and all two-factor interactions; and the latter with all three-factor interactions included. For computational efficiency, the logarithm method was used for the simulation, cf. Engel and Huele (1996b).

The amount of variance heterogeneity that is introduced by the factor α is as follows. For α = 0, σ² is a constant (2.72). For α = 0.3, σ² lies in the interval [0.70, 10.49], and for α = 0.6 it lies in the interval [0.18, 40.45]. Each treatment is simulated 1000 times. The complete simulation study has 3 replicates to improve the accuracy of the estimates of the responses.
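A reduced version of one cell of this simulation can be sketched as follows. The sketch uses the mean and dispersion coefficients above, but it omits the two-factor interactions in the mean model, uses a main-effects-only variance model, and far fewer replications than the study's 1000; it is an illustration of the setup, not the report's S-Plus program.

```python
import numpy as np

rng = np.random.default_rng(3)

# Full 2^5 design, columns x1..x5 in {-1, +1}; main-effects model matrix.
x = np.array([[((i >> b) & 1) * 2 - 1 for b in range(5)] for i in range(32)],
             dtype=float)
X = np.column_stack([np.ones(32), x])
beta_true = np.array([4.0, 1.5, 0.0, -2.0, 1.0, -0.75])
gamma = np.array([1.0, -1.0, -0.5, 0.0, 1.0, 2.0])   # gamma_0 and gamma
alpha = 0.3
mu = X @ beta_true                    # interactions omitted in this sketch
sigma2 = np.exp(gamma[0] + alpha * (x @ gamma[1:]))

def estimate(y, cycles):
    """OLS fit, then `cycles` rounds of variance modelling and WLS refit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(cycles):
        r2 = (y - X @ beta) ** 2
        g = np.linalg.lstsq(X, np.log(r2 + 1e-12), rcond=None)[0]
        w = np.exp(-X @ g)            # estimated weights 1 / sigma_hat^2
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta

# Bias and MSE of the beta estimates over a small number of data sets.
reps = 200
est = np.array([estimate(mu + rng.normal(scale=np.sqrt(sigma2)), cycles=2)
                for _ in range(reps)])
bias = est.mean(axis=0) - beta_true
mse = ((est - beta_true) ** 2).mean(axis=0)
```

Varying `cycles` over 0, 2 and 4 and the heterogeneity over α = 0, 0.3 and 0.6 reproduces the structure of the 3^3 study (the df factor would additionally require changing the number of terms in the fitted mean model).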
A summary of the effects observed in this simulation study is shown in Table 3.2.

    response          main effects   interactions
    E(β̂)             -              -
    MSE(β̂)           α, df          C*α*df
    E(γ̂), γ_j ≠ 0    C              C*α
    MSE(γ̂)           C, df          C*df

    Table 3.2: The observed effects in the Monte Carlo experiment.

Firstly, the vector β is considered. Since it is a (weighted) least squares property that E(β̂) = β, none of the factors in Table 3.1 has an effect on the bias of β̂ (which is zero). For the MSE of β̂, the main effects of α and df are large and the main effect of C is small. However, more important than these main effects is the large three-factor interaction between C, α and df. In Figure 3.1, this three-factor interaction is shown for MSE(β̂_1) as an illustration, but the interaction is important for all β̂_i. Apparently, the levels of α and df determine the effect of C on MSE(β̂_1). If α equals 0 or 0.3, then the MSE at C=0 is at most equal to, but often smaller than, the MSE at the other values of C (independent of the level of df). This inefficiency of continued iteration under moderate variance heterogeneity was also noted by Hooper (1993, p. 179). Notice, however, that when df=26 and α=0.6, C=0 is much less efficient than C=2.

Figure 3.1. MSE(β̂_1): three-factor interaction of C, α and df.

Obviously, iterating with estimated weights is only efficient under large variance heterogeneity, with weights that are estimated accurately, i.e. weights based on residuals with enough degrees of freedom. In other cases, the OLS estimator of β is preferred. At C=0, var(β̂) = (X'X)⁻¹X'ΣX(X'X)⁻¹, generally. When (X'X)⁻¹ = (1/n)I, we have var(β̂) = X'ΣX/n².
Since Σ = diag(σ_1², ..., σ_n²), var(β̂) depends on Σ. It also follows from these formulae that as α increases, var(β̂_i) increases as well. Empirically: look at Table 4.1 and move horizontally through the MSE row (at C=0). Moreover, at C=0, var(β̂_i) = ∑_t σ_t²/n² is independent of i. Looking at Table 4.1, this can be observed empirically as well: for α fixed, going down the MSE column (at C=0), it is easily seen that var(β̂_i) is independent of i. Notice also that cov(β̂_i, β̂_j) ≠ 0, generally.

For C > 0, only asymptotic results are known. From Carroll and Ruppert (1988, pp. 11-15), it follows that β̂(WLS) is consistent and asymptotically normally distributed, with a variance-covariance matrix that depends on X, Σ, C and n.

Secondly, the behaviour of γ̂ is discussed. Contrary to the results for β̂, the bias of γ̂ is not constantly zero, but depends on the true values of the elements of the vector γ. When the true γ_j equals zero, none of the factors has an effect on the bias of γ̂_j, which appears to be zero in this case. This means that parameters γ_j that are zero in reality will not have an increased probability of being included in the fitted model because of bias. When the true γ_j is non-zero, the factor C has a large linear effect on the bias. The sign of the slope is equal to the sign of γ_j, and the bias seems zero around C=2. The rate of decrease (increase) depends on the size of |γ_j|: the larger |γ_j|, the larger the slope. The main effects of α and df are small compared to the main effect of C. See Figure 2 for an illustration.

Figure 2. Bias(γ̂_1): main effect of C.

Figure 3. Bias(γ̂_1): two-factor interaction of C and α.

[Figure: MSE(γ̂_1) against the number of cycles.]
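The two expressions for var(β̂) at C=0 can be checked numerically. The sketch below verifies, for a 2^3 design with invented run variances, that the sandwich formula reduces to X'ΣX/n² when X'X = nI, and that the diagonal elements all equal ∑_t σ_t²/n², independently of i:

```python
import numpy as np

# Check var(betahat_OLS) = X' Sigma X / n^2 for a +/-1 orthogonal design
# (X'X = n I), using the sandwich formula (X'X)^-1 X' Sigma X (X'X)^-1.
rng = np.random.default_rng(4)

x = np.array([[((i >> b) & 1) * 2 - 1 for b in range(3)] for i in range(8)],
             dtype=float)
X = np.column_stack([np.ones(8), x])          # 2^3 design, n = 8, m = 4
n = X.shape[0]
Sigma = np.diag(rng.uniform(0.5, 5.0, size=n))   # invented run variances

XtX_inv = np.linalg.inv(X.T @ X)
sandwich = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
assert np.allclose(sandwich, X.T @ Sigma @ X / n**2)

# The diagonal elements are all sum(sigma_t^2)/n^2, independent of i.
assert np.allclose(np.diag(sandwich), Sigma.trace() / n**2)
```

The off-diagonal elements of the sandwich matrix are generally non-zero, matching the remark that cov(β̂_i, β̂_j) ≠ 0 under heterogeneous variances.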
Publication date: 1996